The utility of artificially evolved sequences in protein threading and fold recognition.
Abstract
Template-based protein structure prediction plays an important role in functional genomics by providing structural models of gene products, which can be used by structure-based approaches to function inference. From a systems-level perspective, high structural coverage of the gene products in a given organism is critical. Despite continuous efforts to develop more sensitive threading approaches, confident structural models cannot be constructed for a considerable fraction of proteins because it is difficult to recognize low-sequence-identity templates that share a fold with the target. Here we introduce a new modeling stratagem, which employs a library of synthetic sequences to improve template ranking in fold recognition by sequence profile-based methods. We developed a new method for optimizing generic protein-like amino acid sequences to stabilize their respective structures using a combined empirical scoring function, which is compatible with those commonly used in protein threading and fold recognition. We show that the artificially evolved sequences, whose average sequence identity to the wild-type sequences is as low as 13.8%, have a significant capability to recognize the correct structures. Importantly, the quality of the corresponding threading alignments is comparable to that of alignments constructed using conventional wild-type approaches (the average TM-score is 0.48 and 0.54, respectively). Fold recognition that uses data fusion to combine ranks calculated for both wild-type and synthetic template libraries systematically improves the detection of structural analogs. Depending on the threading algorithm used, it yields on average 4-16% higher recognition rates than using the wild-type template library alone. Synthetic sequences artificially evolved for the template structures provide an orthogonal source of signal that could be exploited to detect those templates that go unrecognized by standard modeling techniques. This opens up new directions in the development of more sensitive threading methods with enhanced capabilities for targeting difficult, midnight-zone templates.
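The abstract does not say which data fusion rule is used to combine the two rankings. As a purely illustrative sketch, the Python below assumes a simple rank-sum rule: each template is ranked separately in the wild-type and synthetic libraries, and the two ranks are added. The function names, template identifiers and scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank templates 1..N, highest score first; ties are broken by name."""
    ordered = sorted(scores, key=lambda t: (-scores[t], t))
    return {template: i + 1 for i, template in enumerate(ordered)}


def fuse_ranks(wild_type: Dict[str, float], synthetic: Dict[str, float]) -> List[str]:
    """Order templates by the sum of their ranks in the two libraries.

    Assumption: rank-sum fusion. A template missing from one library
    receives a rank one worse than the worst possible rank.
    """
    wt = ranks_from_scores(wild_type)
    syn = ranks_from_scores(synthetic)
    worst = max(len(wt), len(syn)) + 1
    templates = set(wt) | set(syn)
    fused = {t: wt.get(t, worst) + syn.get(t, worst) for t in templates}
    return sorted(templates, key=lambda t: (fused[t], t))


# Hypothetical threading scores for four templates (higher is better).
wild_type_scores = {"tmplA": 9.1, "tmplB": 7.4, "tmplC": 7.0, "tmplD": 3.2}
synthetic_scores = {"tmplA": 6.0, "tmplB": 5.5, "tmplC": 8.8, "tmplD": 2.1}

fused_order = fuse_ranks(wild_type_scores, synthetic_scores)
print(fused_order)
```

In this toy input, tmplC ranks third against the wild-type library but first against the synthetic one, so the fused ordering places it above tmplB. This is the kind of complementary signal the abstract describes, although the rule used in the study may differ.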
Journal: Journal of Theoretical Biology
Volume: 328
Publication date: 2013